ABSTRACT

This study summarizes recent achievements in the
expanding development of man/machine communications and reviews
current technological hurdles associated with the development of
artificial intelligence systems which can generate and recognize
human speech patterns. With the development of such systems, one
potential application would be the establishment of machine-assisted
reading centers, permitting significantly increased individualized
reading instruction similar to the techniques employed in modern
language laboratories to supplement classroom instruction. The
Computer Assisted Reading Educational System (CARES) is proposed as
a model for a reading laboratory and described in terms of flow
diagrams, system response parameters, input/output displays and
devices, estimates of required machine size, system cost, and
development time. (Author/RB)
Interactive Relationships with Computers
in Teaching Reading 
Rene M. Doublier

I. Introduction

Coincident with the widespread introduction of automatic
computational machinery (around twenty years ago), a new science came
into being whose fundamental goal was to develop improvements in
the techniques of man/machine communication. It soon became
obvious to the "computer scientist" that many potential users of
digital processors were thwarted by the effort required to master
the difficult, low-level, machine-language programming
techniques then available, and so (with invaluable assistance
from linguists and grammarians), programming system architects
set out upon the development of near-natural programming languages
which have since become known under such popular mnemonic titles
as "Fortran," "Algol," "Cobol," etc. Yet, even with these vastly
simplified translators, the ultimate means of communication with
machines remained an elusive goal.

"If computers could only speak their answers 
as well as display them graphically* . • " 

"If computers could only hear^nd undet^stand 
spoken requests as well as those ma^de on typewriters, 
punched cards, or magnetic tape* . . " 
The first of these capabilities is commonly referred to as
computer-generated or "synthesized" speech, whereas the second
(and, incidentally, by far the more difficult task) is usually termed
automatic speech recognition.

The applications of such a system are too numerous and varied 
to attempt a complete listing in the present paper. Rather, it is 
our purpose to acquaint the reader with the fundamental technical 
problems currently limiting the "state of the art" in man/machine 
communications, and to present a possible model for a specialized 
application, namely a "Computer-Assisted Reading Educational 
System" (CARES). 

It is hoped that, in the course of this exposition, some of the 
problems facing the computer scientist in teaching a machine to 
speak and to understand human speech will be recognized as being 
common to the teacher/human pupil situation. 
II. Some Fundamental Linguistic Concepts

Speech is a form of communication between human beings which
involves the generation and reception of a rather complex acoustic
signal. The first major decision facing an architect of a man/machine
communication system is that of establishing the fundamental
linguistic units which are either to be converted to acoustic energy
(synthesized) or extracted from a received waveform (recognized).

The most familiar language units are, of course, words. It
might seem natural then to propose that our strategy employ the
stored characteristics of a table of words, representative of the
base language. However, the awesome size of most unabridged
dictionaries of, for example, the English language suggests that
inordinately large amounts of computer memory capacity would be
required to tabularize the characteristics of words and their various
derivatives.

Closer examination of the acoustic properties of human speech
reveals that we can express the elemental sounds of a language as
a finite set of discrete symbols, commonly called "segmental
phonemes."

Naively, then, it would appear that a far simpler and more
efficient linguistic representation could be achieved by specifying
the language at the phonemic level, and that speech could be
produced (or recognized) by a simple concatenation (or segmentation)
of these discrete units. Unfortunately, three significant problems
associated with the characteristics of human speech severely
complicate the segmental phoneme approach.
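The naive scheme just described can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical pronouncing lexicon (`LEXICON`) and a hypothetical table of stored per-phoneme parameter frames (`UNITS`); the phoneme symbols and numbers are placeholders, not measured data.

```python
# Hypothetical pronouncing lexicon: word -> sequence of phoneme symbols.
LEXICON: dict[str, list[str]] = {
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
}

# Hypothetical stored unit for each phoneme: a short list of parameter
# frames (placeholder numbers standing in for acoustic parameters).
UNITS: dict[str, list[float]] = {
    "K": [1.0, 1.0],
    "S": [2.0, 2.0, 2.0],
    "AE": [3.0, 3.0, 3.0, 3.0],
    "T": [4.0, 4.0],
}


def phonemes_for(text: str) -> list[str]:
    """Look up each word and flatten the result into one phoneme string."""
    result: list[str] = []
    for word in text.lower().split():
        result.extend(LEXICON[word])
    return result


def naive_concatenate(phonemes: list[str]) -> list[float]:
    """Join the stored units end to end, ignoring neighboring phonemes."""
    frames: list[float] = []
    for p in phonemes:
        frames.extend(UNITS[p])
    return frames
```

For "cat sat" the stored "AE" unit is reused unchanged in both words; the three problems that follow are precisely why such context-free reuse falls short.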

First of all, continuous speech is not particularly amenable 
to discrete analysis. The acoustic properties of one phoneme are 
not isolated and readily segmented from those of the surrounding 
phonemes. Thus, it is difficult to establish phoneme boundaries, 
since the characteristics of one phoneme often glide continuously 
into those of the following phoneme. This, of course, would severely
complicate automatic (machine) recognition procedures based solely
on phonemic segmentation.

Secondly, these "characteristic" acoustic properties are
highly context-sensitive. That is, the properties of a given phoneme
vary according to the linguistic and acoustic properties of the
surrounding phonemes.
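To make context sensitivity concrete, the sketch below applies one context rule of this kind: the same vowel receives a different duration depending on the phoneme that follows it. It assumes the familiar tendency of English vowels to be longer before voiced consonants (compare "bead" and "beat"); the durations, the phoneme classes, and the lengthening factor of 1.5 are hypothetical placeholders, and a realistic rule set would modify many parameters, not just duration.

```python
# Hypothetical base durations in milliseconds (placeholders, not measurements).
BASE_DURATION_MS: dict[str, int] = {"B": 80, "IY": 120, "D": 70, "T": 90}
VOWELS = {"IY"}
VOICED_CONSONANTS = {"B", "D"}
LENGTHENING_FACTOR = 1.5  # hypothetical


def apply_context_rules(phonemes: list[str]) -> list[tuple[str, int]]:
    """Return (phoneme, duration) pairs after one context-dependent rule:
    a vowel followed by a voiced consonant is lengthened; every other
    phoneme keeps its base duration."""
    out: list[tuple[str, int]] = []
    for i, p in enumerate(phonemes):
        duration = BASE_DURATION_MS[p]
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p in VOWELS and following in VOICED_CONSONANTS:
            duration = round(duration * LENGTHENING_FACTOR)
        out.append((p, duration))
    return out
```

With these placeholder values, the vowel in "bead" (B IY D) comes out at 180 ms and the vowel in "beat" (B IY T) stays at 120 ms, although both words use the same stored "IY".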

Finally, there are significant variations in speech patterns,
not only from one speaker to the next, but also in identical phrases
spoken by a single speaker on different occasions.

The trade-off is therefore evident. Either we require vast
amounts of memory to store our library of larger speech units
(words, phrases, sentences, etc.), or we must develop and
efficiently program a complex set of context- and speaker-dependent
modifier rules for our smaller (phonemic) elements. Until computer
science develops more advanced and cost-effective techniques of
storing and rapidly retrieving the staggering quantity of data
required to characterize a diverse vocabulary of words, phrases, or
sentences, it would appear that the only feasible technical approach
to the synthesis and recognition problems would employ phonemes as
basic linguistic units. Once that decision has been made, the next
fundamental question to be considered relates to the order of
complexity of the phoneme modifier rules.

Research completed at the University of Southern California's
Acoustic Phonetics and Hybrid Computation Laboratories has
yielded a set of phoneme characteristics and context-modifier rules
for General American English which have been compactly programmed
in computers of only moderate size (1). Employing these
characteristics and rules, in conjunction with a Terminal Analog Speech
Synthesizer, researchers have produced artificially generated
speech of high intelligibility and reasonably natural quality.
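A terminal analog synthesizer imitates the input/output behavior of the vocal tract, typically with a set of formant resonators, rather than modeling its physical shape. The sketch below shows that idea in digital form. It is not the USC laboratories' design, which the paper does not describe; it assumes Klatt-style second-order resonators connected in cascade, a crude impulse-train source, a 10 kHz sampling rate, and illustrative formant values.

```python
import math


def resonator(signal: list[float], freq_hz: float, bw_hz: float,
              fs_hz: float) -> list[float]:
    """Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # makes the gain at 0 Hz exactly 1
    y1 = y2 = 0.0
    out: list[float] = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Crude voiced source: one unit impulse per pitch period."""
    period = round(fs_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize_vowel(formants: list[tuple[float, float]], f0_hz: float = 100.0,
                     fs_hz: float = 10000.0, n_samples: int = 2000) -> list[float]:
    """Pass the source through one resonator per (frequency, bandwidth)
    pair, the resonators being connected in series (cascade)."""
    signal = impulse_train(f0_hz, fs_hz, n_samples)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs_hz)
    return signal
```

For example, `synthesize_vowel([(700, 60), (1200, 90), (2600, 120)])` yields 0.2 s of a roughly open-vowel sound; in a phoneme-based system, the context-modifier rules would supply these formant targets and durations for each segment.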

While satisfactory solutions to the synthesis problem have
been achieved utilizing phonemic-level analysis, the previously
mentioned problems of phoneme boundary identification and human
speaker variations have severely restricted progress in the area
of automatic speech recognition.
III. Some Additional Thoughts on the Recognition Problem

In designing an algorithm for machine recognition of human
speech, we might hope to pattern our strategy after our best
understanding of how the human brain decodes the speech waveform.
Unfortunately, the actual mechanisms of human perception and
recognition of speech are only rather poorly understood. For example,
little is known about the temporal span of the recognition unit. Some
researchers argue that the phonemic level is the most likely, while
others propose that the recognition process is accomplished at higher
linguistic levels, such as syllables, or even possibly words. Since
we have concluded that speech is a random process which shows
strong evidence of being non-stationary, it is small wonder that
past attempts at automatic machine recognition of human speech
have met with only very limited success. This is primarily a result
of the fact that these experiments have been based, to a large extent,
on acoustic pattern recognition, and have made only restricted
use of other available cues and linguistic constraints.

The evidence appears to be mounting that a generalized
recognition algorithm (that is, one which is speaker-independent and
which assumes no a priori knowledge of spoken text) will be based
not only upon the recognition of acoustical patterns of elemental
sound units (for example, phonemes), but will also require higher
arguments would lead one to conclude that, at least on some
occasions, it will be necessary to employ the more powerful techniques
of analysis at higher linguistic levels. It is the research and
development of this minimal set of rules which is the principal
stumbling block facing speech researchers.
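The value of a higher-level constraint can be shown with a toy example: a purely acoustic decision, made segment by segment, may yield a phoneme string that is not a word at all, whereas restricting the answer to a known vocabulary recovers a plausible word. The per-segment scores, the three-word vocabulary, and the product-of-scores criterion below are hypothetical assumptions chosen only for illustration.

```python
# Hypothetical acoustic scores: for each segment, candidate phonemes with
# a confidence between 0 and 1 (placeholder numbers).
SEGMENT_SCORES: list[dict[str, float]] = [
    {"P": 0.6, "B": 0.4},
    {"EH": 0.5, "IH": 0.45, "AE": 0.05},
    {"M": 0.55, "N": 0.45},
]

# Hypothetical vocabulary acting as the higher-level (lexical) constraint.
VOCABULARY: dict[str, list[str]] = {
    "pin": ["P", "IH", "N"],
    "bin": ["B", "IH", "N"],
    "pen": ["P", "EH", "N"],
}


def best_phoneme_string(scores: list[dict[str, float]]) -> list[str]:
    """Acoustic-only decision: take the top candidate in each segment."""
    return [max(seg, key=seg.get) for seg in scores]


def best_word(scores: list[dict[str, float]],
              vocabulary: dict[str, list[str]]) -> str:
    """Lexically constrained decision: choose the vocabulary word whose
    phonemes give the highest product of segment scores."""
    def word_score(phonemes: list[str]) -> float:
        if len(phonemes) != len(scores):
            return 0.0
        total = 1.0
        for seg, p in zip(scores, phonemes):
            total *= seg.get(p, 0.0)
        return total

    return max(vocabulary, key=lambda w: word_score(vocabulary[w]))
```

Here the acoustic-only answer is the non-word P EH M, while the constrained search returns "pen", because the vocabulary rules out the acoustically favored final M.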

The foregoing would seem to paint a rather bleak picture for 
the immediate potential of interactive (man/machine) communica- 
tions. However, if we now focus our attention on a particular 
application of this capability, some simplifying assumptions may be 
made which will significantly reduce the magnitude of the
technological problems which must be overcome.

IV. Computer-Assisted Reading Instruction: The Model

One of the fundamental problems facing our educational system 
is that of providing adequate individualized reading instruction.

"Classes containing thirty-five to forty students
are, unfortunately, numerous. Teachers of reading
in these classrooms complain that they cannot do the
job. They mean that they cannot find time for thorough
ongoing diagnosis and individual programs for the
children who need individual help." (2)

Furthermore, current economic pressures on educational funding
are aggravating rather than alleviating this problem.

It would appear then that, just as our universities and colleges
have developed systems to permit individualized practice in the

instruction. The following is a brief description of some of the
essential and desirable features of a computer-based system for
reading instruction.

(1) Pupil Terminals - The pupil terminal includes the interface
equipment for displaying written text (either in phonic or lexical
form), receiving and converting the pupil's spoken responses into
digital data, and the digital-to-analog synthetic voice tract
equipment necessary to produce artificial speech. The graphic display
would be best implemented using a cathode ray tube system, with
the additional features of a movable cursor to pace the student, an
electrostatic pointer to permit the student to synchronize the
computer to his position in the text, and a phonic-equivalent display
above the lexical form of the word.
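The phonic-equivalent display amounts to a column alignment of two lines of text. The sketch below renders it in characters, assuming a hypothetical table of phonic respellings; on the proposed cathode ray tube display the same alignment would be done in screen coordinates instead.

```python
def phonic_display(words: list[str], phonic: dict[str, str]) -> tuple[str, str]:
    """Return (phonic line, lexical line) with each phonic respelling
    left-aligned above its word. Each column is as wide as the longer of
    the two forms, and columns are separated by one space. Words missing
    from the hypothetical `phonic` table are shown unchanged."""
    top: list[str] = []
    bottom: list[str] = []
    for w in words:
        p = phonic.get(w, w)
        width = max(len(w), len(p))
        top.append(p.ljust(width))
        bottom.append(w.ljust(width))
    return " ".join(top).rstrip(), " ".join(bottom).rstrip()
```

For example, with the hypothetical respellings `{"the": "thuh", "cat": "kat"}`, "kat" appears directly above "cat" even though "thuh" is longer than "the".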

(2) Instructional Feedback - The system must have the capability
of providing corrective instructions to the student. This, of course,
implies that the system must be capable not only of generating
high-quality, natural-sounding speech, but also of identifying error
patterns or difficulties in pronunciation in the student's spoken
text. Note, however, that this "speech recognition" requirement is
a considerably simplified version of that described in earlier
sections of this paper, since the computer may be programmed to have
a priori knowledge of the text. The recognition task is thus reduced

"pronunciation accuracy" threshold, the computer will issue
corrective instructions as both graphic and audible corrective
responses (phonic, word, and/or phrase).
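Because the expected text is known in advance, the comparison step reduces to aligning the phonemes the pupil was expected to say with those the recognizer reported. The sketch below assumes such a recognizer already yields a phoneme string; it uses the standard edit-distance (Levenshtein) alignment, and the accuracy measure (one minus errors per expected phoneme) and the 0.8 threshold are hypothetical choices, not taken from the CARES proposal.

```python
from typing import Optional

Pair = tuple[Optional[str], Optional[str]]


def align(expected: list[str], heard: list[str]) -> list[Pair]:
    """Levenshtein alignment; returns (expected, heard) pairs, with None
    marking an omitted (heard is None) or inserted (expected is None) phoneme."""
    n, m = len(expected), len(heard)
    # cost[i][j] = minimum edits turning expected[:i] into heard[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = int(expected[i - 1] != heard[j - 1])
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the alignment.
    pairs: list[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(expected[i - 1] != heard[j - 1]):
            pairs.append((expected[i - 1], heard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((expected[i - 1], None))
            i -= 1
        else:
            pairs.append((None, heard[j - 1]))
            j -= 1
    return list(reversed(pairs))


def check_reading(expected: list[str], heard: list[str],
                  threshold: float = 0.8) -> tuple[bool, float, list[Pair]]:
    """Return (passed, accuracy, errors); the error pairs are what a
    program could turn into graphic and audible corrective responses."""
    errors = [(e, h) for e, h in align(expected, heard) if e != h]
    accuracy = max(0.0, 1.0 - len(errors) / max(len(expected), 1))
    return accuracy >= threshold, accuracy, errors
```

For example, if the pupil reading "cat" (K AE T) omits the vowel, `check_reading` reports the omission as `("AE", None)` with an accuracy of about 0.67, below the assumed threshold.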

(3) System Response Time - The system must operate in as near
"real time" as possible. Significant delays in input/output
processing would prove disastrous in a reading instruction environment.
Using as a basis the aforementioned research performed at the
University of Southern California (1), it is estimated that a small,
special-purpose processor with 32K main-memory capacity and a
one- to two-microsecond memory cycle time would provide sufficient
computational power and speed.
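A rough sense of what a one- to two-microsecond memory cycle allows can be had from simple arithmetic. Assuming, purely for illustration, that speech is processed in 10-millisecond frames (a frame length the paper does not specify), the number of memory cycles available per frame is

$$
N = \frac{10^{-2}\ \text{s}}{2 \times 10^{-6}\ \text{s}} = 5{,}000
\quad\text{to}\quad
N = \frac{10^{-2}\ \text{s}}{1 \times 10^{-6}\ \text{s}} = 10{,}000 ,
$$

and every analysis, synthesis, and comparison step for a frame must fit within that budget if the system is to keep pace with the pupil.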

(4) Cost and Development Lead-Time - Excluding initial development
costs, individual pupil terminals (including central computer
processing) could be delivered (in quantities of fifty pupil terminals)
at a cost of around $35,000 per unit. Although this cost may appear
excessive, it should be noted that the useful life of the system can be
reasonably estimated to exceed five years. The development lead-time
of the CARES system, assuming a significant commitment on
the part of a major computer manufacturer, would be less than five
years.
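The quoted figures imply the following order of magnitude for a fifty-terminal installation, taking the five-year useful life as the amortization period and ignoring maintenance, operating costs, and the excluded development costs (simplifying assumptions of this illustration):

$$
C_{\text{total}} = 50 \times \$35{,}000 = \$1{,}750{,}000,
\qquad
\frac{\$35{,}000}{5\ \text{years}} = \$7{,}000 \ \text{per terminal per year}.
$$

Since the useful life is estimated to exceed five years, the annual figure is an upper bound under these assumptions.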

V. Some Final Thoughts

With every day that passes, our way of life becomes more
and more dependent upon computers, whether they be the large
special purpose devices that perform routine accounting and
controlling tasks. And yet, although these machines touch all of our
lives in one way or another, their fullest potential to serve mankind
will be realized only when we develop methods of teaching computers
to communicate in the manner most natural to us, namely, spoken
language.